and its possible fluctuation range at a given confidence level. The proposed model is characterized by its ability to train on insufficient samples and to provide uncertainty analysis, which makes it well suited to most wind farms in China (newly built or large-scale wind farms). First, a grouping mechanism is used to divide the wind turbines into several groups so that a forecasting model can be established for each group separately. Second, a selection method that properly takes the characteristics of the NWP error distribution into consideration is presented to improve the forecasting accuracy of each group. Third, the parameters of the kernel function and the initial value of the iteration are determined by particle swarm optimization to further enhance forecasting accuracy. Two wind farms in China are involved in the primary data collection. The results obtained from the ORVM models are tested against the predictions generated by GA-ANN and SVM. Results show that the proposed model has better prediction accuracy, wider application scope and more efficient calculation.

Keywords: Grouping forecasts; Uncertainty analysis; Optimized relevance vector machine
Nomenclature

ANN      Artificial neural network
SVM      Support vector machine
RVM      Relevance vector machine
ORVM     Optimized relevance vector machine
NWP      Numerical weather prediction
GA       Genetic algorithm
PSO      Particle swarm optimization
GA-ANN   ANN optimized by genetic algorithm
SOFM     Self-organizing feature mapping
WT       Wind turbine

* Corresponding author. Mob.: +86 159 116 8089.

1. Introduction

With the deterioration of the environment, the exhaustion of fossil fuels and the increasing demand for electricity, wind power has been attracting significant attention in many countries worldwide. In fact, wind power is the fastest growing source of renewable energy [1]. However, the variable nature of wind energy may put the reliability, stability and power quality of the electric power system at risk [2].

One of the most essential measures to mitigate the serious influence of wind farm integration is short-term wind power forecasting, which allows: (1) reasonable maintenance schedules to be established and minimum spinning reserve capacities to be determined so as to reduce operating costs; (2) the proportion of wind power in the electric system to be increased; (3) the competitiveness of wind power companies to be improved in competitive bidding markets [3,4].

Many wind power prediction methods are in common use, including physical methods [43,44] (the analytical method and the CFD method) and statistical methods (such as artificial neural networks [5-8], fuzzy logic [34,40], the Cao algorithm [38], support vector machines [9-11] and ensemble methods [39]). These prediction models have been applied to economical maintenance scheduling, competitive bidding markets [3,4] and unit commitment [41,42]. However, they also suffer from some disadvantages. Among the physical methods, the analytical method can hardly meet the precision requirement, and the key problem of the CFD method is its computational burden. Among the statistical methods, ANN is the most widely used because of its generalization ability. Since an ANN can theoretically approximate any nonlinear continuous function, it has been successfully applied to wind power prediction. However, the performance of an ANN-based model is sensitive to the size of the training set [37], and minimizing only the training error of a neural network may lead to over-fitting [12]. The consequence is that the prediction error is very small for known inputs but surges for unknown, out-of-sample inputs. This is termed limited generalization [13]. One remedy is to increase the number of training samples, but the resulting demand for a large amount of training data limits the application of ANN models. For instance, it is difficult for newly built wind farms to build a prediction model because their short running time leaves them with insufficient historical data. Furthermore, the shortage of historical data makes it harder to train separate prediction models for the weather variation of different months or seasons.

To enhance generalization ability without requiring a large training set, a statistical learning technique, SVM, has been applied to wind power prediction. The SVM employs a linear function in a high-dimensional feature space as its hypothesis space and makes good predictions from small training sets. This is a highly effective mechanism for avoiding over-fitting. However, despite its success, SVM has some significant practical disadvantages [14]:


1) The kernel function must satisfy Mercer’s condition. That is, it 
must be the continuous symmetric kernel of a positive integral 
operator; 

2) Only a single point estimate can be achieved without any 
uncertainty information; 

3) Although the model is relatively sparse, the number of support vectors grows linearly with the size of the training set, which increases the computational complexity;

4) Additional parameters (such as the insensitivity parameter) must be estimated, which generally entails extra calculation and parameter setting.


To overcome the above drawbacks, a probabilistic learning framework termed the relevance vector machine (RVM) was introduced by Tipping [14]. RVM is a nonlinear pattern recognition model with a simple structure based on Bayesian theory and marginal likelihood. Its key feature is that, besides offering excellent prediction and generalization performance, it remedies the shortcomings of SVM [15-17]. This approach has therefore been successfully applied in many fields, such as load forecasting and fault classification [18-21], but has not yet been applied to wind power prediction.

In addition to advanced mathematics, different strategies have been developed to improve forecast accuracy. Many researchers have demonstrated the existence of a smoothing effect, meaning that overall wind power fluctuations decrease because the different wind resources in a large area offset each other [24,25]. This effect grows with the area of the wind farm and can be employed to manage the power quality and the fluctuations of wind farm output [26-30]. In fact, the effect also occurs when forecasting the power output of a single wind farm, because the forecast errors of wind turbines located at different sites can offset each other and thus reduce the overall forecast error of the wind farm [31,32]. However, the computational cost would surge if the output of every wind turbine were predicted, especially when an NWP is needed for each wind turbine site. Therefore, a grouping method has been developed to divide the wind turbines into several groups according to wind speed correlation, wind power correlation and wind turbine sites. A forecasting model is established for each group, and the forecast of the wind farm output is obtained by adding the results of all groups together.

Considering the smoothing effect, the features of numerical weather prediction (NWP) error, the historical data of wind farms and the nature of RVM mentioned above, it is not suitable to apply a raw RVM model directly to wind power prediction. Therefore, an optimized relevance vector machine (ORVM) method, a combination of RVM, the grouping method, the selection method for training samples and particle swarm optimization (PSO), is used to predict the wind farm output of each month.

In this paper, Section 2 describes the theory of RVM. Section 3 describes the ORVM wind power forecasting model. To verify the effectiveness and superiority of the ORVM model, Section 4 presents a case study with two wind farms that compares the performance of ORVM, SVM and GA-ANN. Section 5 presents the final conclusions.


2. Theory of relevance vector machine [14] 


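Given a set of input-target pairs $\{\mathbf{x}_n, t_n\}_{n=1}^{N}$, assume that $t_n = y(\mathbf{x}_n; \mathbf{w}) + \varepsilon_n$, where $\varepsilon_n$ is mean-zero Gaussian noise with variance $\sigma^2$, that is, $\varepsilon_n \sim N(0, \sigma^2)$. RVM uses a kernel function $K(\mathbf{x}, \mathbf{x}_i)$ and makes predictions by the function:

$$y(\mathbf{x}; \mathbf{w}) = \mathbf{w}^{T}\boldsymbol{\phi}(\mathbf{x}) = \sum_{i=1}^{N} w_i K(\mathbf{x}, \mathbf{x}_i) + w_0 \tag{1}$$

where $\boldsymbol{\phi}(\mathbf{x})$ is the vector of basis functions and $\mathbf{w} = (w_0, w_1, \ldots, w_N)^{T}$ is the weight vector.

Therefore, the probabilistic formulation of the RVM model is defined as

$$p(t_n \mid \mathbf{x}) = N(t_n \mid y(\mathbf{x}_n), \sigma^2) \tag{2}$$

where $N$ represents a Gaussian distribution over $t_n$ with mean $y(\mathbf{x}_n)$ and variance $\sigma^2$, and $y(\mathbf{x}_n)$ is defined as in function (1).

The likelihood function of the whole sample set is defined as follows:

$$p(\mathbf{t} \mid \mathbf{w}, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left\{-\frac{1}{2\sigma^2}\|\mathbf{t} - \boldsymbol{\Phi}\mathbf{w}\|^2\right\} \tag{3}$$

where $\boldsymbol{\Phi}$ is the design matrix whose nth row is $\boldsymbol{\phi}(\mathbf{x}_n)^{T}$.

To avoid the over-fitting that maximum-likelihood estimation of $\mathbf{w}$ and $\sigma^2$ would cause, a constraint is imposed on the weights $w_i$ in the form of the following prior probability distribution:

$$p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_{i=0}^{N} N(w_i \mid 0, \alpha_i^{-1}) \tag{4}$$

where $\boldsymbol{\alpha}$ is a vector of $N+1$ hyperparameters.

The posterior over the unknowns is obtained by Bayesian inference:

$$p(\mathbf{w}, \boldsymbol{\alpha}, \sigma^2 \mid \mathbf{t}) = \frac{p(\mathbf{t} \mid \mathbf{w}, \boldsymbol{\alpha}, \sigma^2)\, p(\mathbf{w}, \boldsymbol{\alpha}, \sigma^2)}{p(\mathbf{t})} \tag{5}$$

Assume that a new test input $\mathbf{x}_*$ is used to predict the new test target $t_*$. Then the predictive distribution can be written as:

$$p(t_* \mid \mathbf{t}) = \int p(t_* \mid \mathbf{w}, \boldsymbol{\alpha}, \sigma^2)\, p(\mathbf{w}, \boldsymbol{\alpha}, \sigma^2 \mid \mathbf{t})\, d\mathbf{w}\, d\boldsymbol{\alpha}\, d\sigma^2 \tag{6}$$

The posterior distribution over the weights can consequently be rewritten as:

$$p(\mathbf{w} \mid \mathbf{t}, \boldsymbol{\alpha}, \sigma^2) = \frac{p(\mathbf{t} \mid \mathbf{w}, \sigma^2)\, p(\mathbf{w} \mid \boldsymbol{\alpha})}{p(\mathbf{t} \mid \boldsymbol{\alpha}, \sigma^2)} \tag{7}$$

Therefore, the learning process of RVM becomes the search for the $\boldsymbol{\alpha}$ and $\sigma^2$ that maximize $p(\boldsymbol{\alpha}, \sigma^2 \mid \mathbf{t}) \propto p(\mathbf{t} \mid \boldsymbol{\alpha}, \sigma^2)\, p(\boldsymbol{\alpha})\, p(\sigma^2)$ by maximum marginal likelihood estimation:

$$p(\mathbf{t} \mid \boldsymbol{\alpha}, \sigma^2) = \int p(\mathbf{t} \mid \mathbf{w}, \sigma^2)\, p(\mathbf{w} \mid \boldsymbol{\alpha})\, d\mathbf{w} = (2\pi)^{-N/2} \left|\sigma^2\mathbf{I} + \boldsymbol{\Phi}\mathbf{A}^{-1}\boldsymbol{\Phi}^{T}\right|^{-1/2} \exp\left\{-\frac{1}{2}\mathbf{t}^{T}\left(\sigma^2\mathbf{I} + \boldsymbol{\Phi}\mathbf{A}^{-1}\boldsymbol{\Phi}^{T}\right)^{-1}\mathbf{t}\right\} \tag{8}$$

where $\mathbf{A} = \mathrm{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)$.

$\boldsymbol{\alpha}$ and $\sigma^2$ are computed by setting the derivatives of function (8) with respect to $\boldsymbol{\alpha}$ and $\sigma^2$ to zero. When $\alpha_i$ becomes extremely large, $w_i$ goes to zero because of the constraint imposed by the prior, whereas a $w_i$ associated with a small $\alpha_i$ remains free to fit the sample data. The iteration proceeds until the convergence condition is fulfilled. During parameter estimation most $\alpha_i \to \infty$, so the corresponding $w_i = 0$ and many columns of the kernel matrix take no part in the prediction calculation. This is why RVM achieves sparsity.

The iterative estimation of the hyperparameters yields the maximizing values $\boldsymbol{\alpha}_{MP}$ and $\sigma^2_{MP}$, and predictions are then made from the posterior distribution over the weights conditioned on these values. For a new input $\mathbf{x}_*$, the predictive result can be described as follows:

$$p(t_* \mid \mathbf{t}, \boldsymbol{\alpha}_{MP}, \sigma^2_{MP}) = \int p(t_* \mid \mathbf{w}, \sigma^2_{MP})\, p(\mathbf{w} \mid \mathbf{t}, \boldsymbol{\alpha}_{MP}, \sigma^2_{MP})\, d\mathbf{w} = N(t_* \mid y_*, \sigma_*^2) \tag{9}$$

where

$$y_* = \boldsymbol{\mu}^{T}\boldsymbol{\phi}(\mathbf{x}_*) \tag{10}$$

$$\sigma_*^2 = \sigma^2_{MP} + \boldsymbol{\phi}(\mathbf{x}_*)^{T}\boldsymbol{\Sigma}\,\boldsymbol{\phi}(\mathbf{x}_*) \tag{11}$$

and $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are the posterior mean and covariance of the weights.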
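As a rough illustration of functions (10) and (11), the following sketch computes the weight posterior for fixed hyperparameters and then the predictive mean and variance at new inputs. It assumes NumPy, a Gaussian kernel and hand-picked values of `alpha` and `sigma2` (in the actual method these are estimated by maximizing the marginal likelihood); all function names and the synthetic data are hypothetical and not taken from the paper.

```python
import numpy as np

def gaussian_kernel(a, b, width):
    """Gaussian kernel matrix between the rows of a (n, d) and b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * width ** 2))

def design_matrix(x, centers, width):
    """Rows are [1, K(x, x_1), ..., K(x, x_N)]; the leading 1 carries the bias w_0."""
    return np.hstack([np.ones((x.shape[0], 1)), gaussian_kernel(x, centers, width)])

def posterior(phi, t, alpha, sigma2):
    """Posterior mean and covariance of the weights for fixed alpha and sigma2."""
    cov = np.linalg.inv(phi.T @ phi / sigma2 + np.diag(alpha))
    mean = cov @ phi.T @ t / sigma2
    return mean, cov

def predict(phi_new, mean, cov, sigma2):
    """Predictive mean (function 10) and variance (function 11)."""
    y = phi_new @ mean
    var = sigma2 + np.einsum("ij,jk,ik->i", phi_new, cov, phi_new)
    return y, var

# Synthetic one-dimensional data, for illustration only.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
t_train = np.sin(2 * np.pi * x_train[:, 0]) + 0.1 * rng.standard_normal(20)

phi = design_matrix(x_train, x_train, width=0.2)
alpha = np.ones(phi.shape[1])  # assumed value, not optimized
sigma2 = 0.01                  # assumed noise variance
mean, cov = posterior(phi, t_train, alpha, sigma2)

x_new = np.array([[0.25], [0.75]])
y_new, var_new = predict(design_matrix(x_new, x_train, 0.2), mean, cov, sigma2)
```

The predictive variance is never smaller than the noise variance, since the second term of function (11) is a quadratic form in a positive-definite covariance matrix.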


3. RVM-based wind power interval forecasting model 
3.1. Model structure 


The structure of the ORVM wind power grouping forecast model is illustrated in Fig. 1. It is composed of a grouping engine using the SOFM method, a selection engine for training samples, a PSO engine for optimizing the parameters of the kernel function, and an RVM engine that produces the forecast and its confidence interval.


3.2. Grouping of wind turbines 


Wind power production is directly correlated to the wind speed through a power curve. Other atmospheric factors, such as wind direction, pressure, temperature and relative humidity, also have an impact on the actual power output [22,23]. In this paper, the results of a single NWP are adopted as inputs of the ORVM prediction model, and the reference site of this single NWP is the met mast. Although NWP has already been widely applied in wind power forecasting, it has some disadvantages, such as the limited representativeness of a single reference site, heavy computation and low computational resolution [33-35]. Current practice is to calculate the NWP at the met mast to represent the wind profile of the whole wind farm, which impairs the accuracy of wind power forecasting. In particular, as the wind farm grows, the representativeness of a single met mast weakens accordingly. The more reference sites are selected for weather prediction, the higher the forecasting accuracy that might be achieved, but the larger the computational burden that follows. To balance practical applicability against forecasting accuracy, a grouping model based on a clustering method, SOFM [36], is proposed to identify similarities in wind speed, wind turbine generating characteristics and wind turbine sites. As shown in Fig. 1, the grouping engine divides the wind turbines into n groups, and every group has its own reference site for NWP. The subsequent engines then operate on the NWP of each group. Note that, within each group, the wind turbine that suffers least from wake effects and topography is selected as the NWP reference site, for example a wind turbine located at a higher altitude or relatively far from the other wind turbines.

Fig. 1. Structure of the ORVM grouping forecast model. (Flowchart: the grouping engine for wind turbines produces groups 1 to n, each with its own NWP (NWP1# to NWPn#); each group passes through a selection engine for training samples, a PSO engine for model parameters and an RVM engine for power forecasts; the group forecasts are combined into interval forecasts for the wind farm.)

Fig. 2. Structure of the grouping engine. (Inputs: wind speed correlation, wind power correlation and WT site; the grouping engine is followed by the selection of the NWP reference point, which uses WT altitude.)

The structure of the grouping engine is shown in Fig. 2. The wind speed correlation and wind power correlation of each wind turbine help the engine put correlated wind turbines into the same group, while the site of each wind turbine brings geographic similarity into the grouping. After grouping, the representativeness of each wind turbine within its own group is evaluated by its altitude and its distance to the wind farm border. Taking a wind farm in north China (named WF1) as an example, the results of the grouping engine and the NWP reference point for each group are shown in Table 1.
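The SOFM clustering behind the grouping engine can be sketched as below. This is a minimal one-dimensional map under several assumptions that the paper does not state: the neighborhood is a window of neighboring cell indices that shrinks linearly, samples are visited cyclically, and the feature vectors (two correlation values and two site coordinates per turbine, already scaled to [0, 1]) are invented for illustration. The function names are hypothetical.

```python
import numpy as np

def train_sofm(x, n_cells, n_iter, lr0=0.5, radius0=1, seed=0):
    """Train a 1-D self-organizing map on x (M samples, k features in [0, 1])."""
    rng = np.random.default_rng(seed)
    w = rng.random((n_cells, x.shape[1]))                   # connection weights
    for t in range(n_iter):
        lr = lr0 * (1.0 - t / n_iter)                       # linearly decaying learning pace
        radius = int(round(radius0 * (1.0 - t / n_iter)))   # shrinking neighborhood (assumed)
        sample = x[t % len(x)]                              # visit samples cyclically
        dist = np.sqrt(((sample - w) ** 2).sum(axis=1))     # Euclidean distance to each cell
        win = int(np.argmin(dist))                          # winning cell
        lo, hi = max(0, win - radius), min(n_cells, win + radius + 1)
        w[lo:hi] += lr * (sample - w[lo:hi])                # pull the neighborhood toward the sample
    return w

def assign_groups(x, w):
    """Label each input vector with the index of its winning cell."""
    dist = ((x[:, None, :] - w[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

# Hypothetical features per turbine: [speed corr., power corr., site x, site y].
features = np.array([
    [0.95, 0.93, 0.10, 0.12],
    [0.94, 0.92, 0.12, 0.15],
    [0.60, 0.58, 0.80, 0.85],
    [0.62, 0.61, 0.82, 0.80],
    [0.30, 0.35, 0.45, 0.50],
    [0.32, 0.33, 0.48, 0.52],
])
weights = train_sofm(features, n_cells=3, n_iter=300)
labels = assign_groups(features, weights)
```

Turbines that share a label would form one group; the choice of the NWP reference turbine within each group (by altitude and distance to the farm border) is a separate step that is not covered here.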


3.3. Selection of training samples 


Figs. 3 and 4 present the accuracy of the NWP wind speed, with the met mast as reference site, in two wind farms (WF1 and WF2). Observe that the correlation coefficient and root mean square error of the NWP wind speed forecasts are not stable and fluctuate with the month or season. In Fig. 3, the NWP forecast accuracy normally peaks in winter (Dec.-Mar.), drops to its lowest in spring (Apr.-June), and then rises modestly in summer and fall (July-Nov.) without reaching the winter maximum. In Fig. 4, the NWP accuracy shows similar features: it peaks in winter and summer and drops to its lowest in autumn and spring. The reason is that the weather in some months or seasons is relatively stable and displays strong regularity. Meteorological phenomena that fit known regularities can be predicted more accurately by NWP, whereas complex weather variation is hard to predict precisely.


Table 1 
Results of grouping engine. 


Group Reference site of NWP WT in the group 
1 WT 58# 16,17,20,55,56,58 
2 WT 18# 15,18,69,70,76 
3 WT 21# 14,19,21,57 
4 WT 24# 23-27,32-36,44,45,54 
5 WT 63# 63 
6 WT 67# 62,64,65,67,68,74,75 
7 WT 59# 22,59,60,82 
8 WT 116# 40,116,119 
9 WT 43# 28-31,37-39,41-43,46-49,53,112,114 
10 WT 12# 12 
11 WT 73# 71,73 
12 WT 87# 61,80,85,87 
13 WT 102# 101,102 
14 WT 52# 51-52,103,107,109-111,113,115,117-122 
15 WT 72# 66,72,77-79,81,83,84,86 
16 WT 99# 88-100 
17 WT 108# 9,108 
18 WT 8# 1-8,10,11,13 


Note: Reference site of NWP indicates the location on which the NWP data are 
calculated. 
Fig. 3. Accuracy of wind speed from NWP for each month in WF1.
Fig. 4. Accuracy of wind speed from NWP for each month in WF2.


As the weather forecast within the wind farm is the key to wind power prediction and the NWP accuracy shows a clear monthly or seasonal characteristic, the accuracy of wind power forecasting can be improved if a model is built for each month. Furthermore, RVM has the advantage of requiring fewer training samples, which facilitates model training for each month. Thus, the candidate NWP samples of each month are passed to the selection engine as model inputs, and prediction models are built according to the NWP error distribution of that month.

Since RVM is a learning method that adapts to small training sets, it is both necessary and feasible to select the most beneficial training data for accurate prediction. On the one hand, the selection of training samples has a significant impact on forecast accuracy. Too few training samples, or samples with large deviations, usually cause the forecast engine to learn the mapping function incorrectly, while a huge number of training samples may mislead it. On the other hand, one month of wind farm data far exceeds the number of training samples that RVM requires: the data resolution of wind farms is normally no coarser than 15 min, so a 30-day month theoretically yields around 2880 data sets (96 per day). Therefore, selecting the most effective NWP data sets as training samples is the key step in predictive modeling.

Taking WF1 (January 2010) and WF2 (January 2011) as test cases, the accuracy of NWP is classified according to the absolute error of the NWP wind speed. For instance, a level of <1.2 m/s means higher NWP accuracy than <1.8 m/s, because only samples whose absolute NWP wind speed error is below 1.2 m/s, rather than 1.8 m/s, are selected as training samples. The test starts with a low NWP accuracy level and then gradually raises it. As shown in Fig. 5, the two lines show a similar trend: the accuracy of the forecasting model at first increases as the NWP accuracy rises. This is because, with better NWP forecasts, the training samples reproduce the actual power curve of the wind turbines more accurately and thus yield higher wind power forecasting accuracy. For WF1, the forecasting accuracy peaks at an absolute NWP wind speed error of <1.5 m/s, while for WF2 it peaks at <1.6 m/s. Beyond that point, the forecasting accuracy declines as the permitted NWP error is reduced further, because eliminating too much data leaves too few training samples, which deteriorates both the generalization and the fit of the forecasting model and hence the forecast accuracy.

The selection engine was applied to the candidate samples of each month in the two wind farms, and the results are recorded in Table 2. The forecast accuracy of each month shows features similar to those of January. Moreover, the accuracy peak of the forecasting model occurs at a different absolute NWP wind speed error (NWP accuracy level) in different months. Since samples of different NWP accuracy levels result in different forecasting accuracy, selecting training samples according to the NWP accuracy level can improve the forecast accuracy.
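The threshold sweep performed by the selection engine might look like the following sketch. It is an illustration under assumptions: a cubic polynomial fitted with NumPy stands in for the RVM forecaster, the data are synthetic, the accuracy of each level is scored on a separate validation slice, and the names `sweep_nwp_error_levels` and `min_samples` are hypothetical.

```python
import numpy as np

def rmse(pred, actual):
    """Root mean square error between two arrays."""
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

def sweep_nwp_error_levels(nwp, measured, power, val_nwp, val_power,
                           levels, min_samples=20):
    """Score each absolute-NWP-error level (m/s) and return the best one."""
    scores = {}
    abs_err = np.abs(nwp - measured)
    for level in levels:
        mask = abs_err < level                 # keep only sufficiently accurate NWP samples
        if mask.sum() < min_samples:
            continue                           # too few samples left to train on
        coeffs = np.polyfit(nwp[mask], power[mask], deg=3)  # stand-in for the RVM model
        scores[level] = rmse(np.polyval(coeffs, val_nwp), val_power)
    best = min(scores, key=scores.get)
    return best, scores[best], scores

# Synthetic data, for illustration only (power in per-unit of capacity).
rng = np.random.default_rng(1)
true_speed = rng.uniform(3.0, 12.0, 600)
nwp_speed = true_speed + rng.normal(0.0, 1.0, 600)
power = np.clip((true_speed / 12.0) ** 3, 0.0, 1.0)

train, val = slice(0, 480), slice(480, 600)
levels = [2.0, 1.8, 1.6, 1.4, 1.2, 1.0, 0.8]   # from low to high NWP accuracy
best_level, best_rmse, scores = sweep_nwp_error_levels(
    nwp_speed[train], true_speed[train], power[train],
    nwp_speed[val], power[val], levels)
```

The `min_samples` guard reflects the trade-off described above: a stricter level gives cleaner samples but eventually leaves too few of them.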


3.4. Optimization of kernel function 


Because the kernel function parameters have a significant impact on forecasting accuracy, particle swarm optimization (PSO) is adopted to search for the optimal kernel width and initial value of the RVM iteration. In this paper, the Gaussian kernel function (12) is adopted:

$$K(\mathbf{x}, \mathbf{x}_i) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{x}_i\|^2}{2\sigma^2}\right) \tag{12}$$

where $\sigma$ is the width of the kernel function.

Twenty particles are used for each parameter in this model, and their fitness function is the root mean square error. The speed and location of the particles are updated by the following functions:

$$v_{id}^{k+1} = \omega v_{id}^{k} + c_1\,\mathrm{rand}()\,(pb_{id}^{k} - x_{id}^{k}) + c_2\,\mathrm{rand}()\,(gb_{id}^{k} - x_{id}^{k}) \tag{13}$$

$$x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1} \tag{14}$$

where $c_1$ and $c_2$ are learning factors; rand() is a uniform random number in [0, 1]; $v_{id}^{k}$ and $x_{id}^{k}$ are the speed and location of the ith particle in the dth dimension at the kth iteration; $pb_{id}^{k}$ and $gb_{id}^{k}$ are, respectively, the individual best location and the group best location of the ith particle in the dth dimension; $\omega$ is the inertia weight factor.
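A compact sketch of the update rules (13) and (14) follows. The inertia weight, learning factors, search bounds and iteration count are assumed values, a swarm of 20 particles in total is used, and a simple quadratic stands in for the validation RMSE that the model would actually evaluate by training an RVM for each candidate (kernel width, initial value) pair. The names `pso` and `stand_in_rmse` are hypothetical.

```python
import random

def pso(objective, bounds, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective over box bounds with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                       # individual best locations
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]      # group best location
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                # Speed update, function (13).
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                # Location update, function (14), clamped to the search box.
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def stand_in_rmse(params):
    """Hypothetical fitness surface with its minimum at width 0.3, initial value 1.0."""
    width, init = params
    return (width - 0.3) ** 2 + (init - 1.0) ** 2

bounds = [(0.01, 2.0), (0.0, 2.0)]   # assumed search ranges
best, best_val = pso(stand_in_rmse, bounds)
```

Clamping the locations to the bounds is a design choice of this sketch rather than part of functions (13) and (14); it keeps the kernel width strictly positive.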

Along with the selected training samples of one month and the corresponding actual power data, the optimized parameters from the PSO engine are transferred to the RVM engine, whose structure is illustrated in Fig. 6.

Fig. 5. Forecast accuracy corresponding to different NWP error levels.

Fig. 6. Structure of raw RVM engine.

Table 2
NWP accuracy level for the training sample selection of each month in two wind farms.

(m/s)   January   February   March   April   May    June
WF1     <1.5      <1.0       <1.5    <1.5    <1.6   <2.0
WF2     <1.6      <1.5       <1.7    <1.9    <1.6   <2.0

(m/s)   July   August   September   October   November   December
WF1     <2.0   <1.6     <1.2        -         <1.7       <1.5
WF2     <1.2   <1.2     <5          <1.6      <1.6       <1.7

618 J. Yan et al. / Renewable and Sustainable Energy Reviews 27 (2013) 613-621


3.5. Steps of grouping forecasts


The application of the ORVM model to wind power prediction can be summarized in the following steps:


(1) Wind turbine grouping phase:
    Assume that M is the number of input samples; N is the number of neural cells in the output layer; k is the number of neural cells in the input layer; t is the current learning iteration; T is the total number of learning iterations.

    a. Calculate the correlation of wind speed and of wind power output for each wind turbine.

    b. Feed the wind turbine sites and the correlations of wind speed and wind power into the grouping engine as input vectors, normalized to the range [0,1]: $X = \{\mathbf{x}_i;\ \mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{ik}) \in R^k,\ i = 1, 2, \ldots, M\}$.

    c. Initialize the SOFM-based grouping engine: connection weights $W = \{\mathbf{w}_j;\ \mathbf{w}_j = (w_{j1}, w_{j2}, \ldots, w_{ji}, \ldots, w_{jk}) \in R^k,\ j = 1, 2, \ldots, N\}$; learning pace $\alpha(t)$; neural cell neighborhood $N_{j^*}(t)$.

    d. Calculate the distance between the connection weights of each neural cell and the input:

       $$d_j = \left[\sum_{i=1}^{k} (x_i - w_{ji})^2\right]^{1/2}, \quad j = 1, 2, \ldots, N \tag{15}$$

       where $x_i$ is the ith component of the current input vector.

    e. Search for the winning neural cell $j^*$, which has the shortest distance:

       $$d_{j^*} = \min_{j} d_j \tag{16}$$

    f. Adjust the network parameters (the connection weights of the winning neural cell and of its neighboring cells) in order to map the input vectors onto the output layer:

       $$\mathbf{w}_j(t+1) = \mathbf{w}_j(t) + \alpha(t)\,[\mathbf{x}(t) - \mathbf{w}_j(t)], \quad j \in N_{j^*}(t) \tag{17}$$

    g. Iterate over every sample and update the learning pace until t = T:

       $$\alpha(t) = \alpha(0)\left(1 - \frac{t}{T}\right) \tag{18}$$

    h. Each input vector is thereby non-linearly mapped to its winning neural cell in the output layer. The vectors mapped to the same winning neural cell belong to the same category.

(2) Training samples selection phase:
    Classify the NWP accuracy levels and feed all candidate samples of each month into the selection engine to determine the most effective level. The samples selected at the most effective level are taken as training samples and are then normalized to the range 0 to 1 for computational convenience.

(3) Model parameters optimization phase:
    Transfer the selected training samples from phase (2) to the PSO engine to determine the most suitable kernel width and initial value of the RVM iteration.

(4) Training and forecasting phase:

    a. Feed the selected training samples and the optimized parameters into the RVM engine and initialize the iteration conditions.

    b. Calculate the posterior distribution over the weights:

       $$p(\mathbf{w} \mid \mathbf{t}, \boldsymbol{\alpha}, \sigma^2) = (2\pi)^{-(N+1)/2}\,|\boldsymbol{\Sigma}|^{-1/2} \exp\left\{-\frac{1}{2}(\mathbf{w} - \boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{w} - \boldsymbol{\mu})\right\} \tag{19}$$

    c. Update the mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$ of the posterior:

       $$\boldsymbol{\mu} = \sigma^{-2}\boldsymbol{\Sigma}\boldsymbol{\Phi}^{T}\mathbf{t} \tag{20}$$

       $$\boldsymbol{\Sigma} = (\sigma^{-2}\boldsymbol{\Phi}^{T}\boldsymbol{\Phi} + \mathbf{A})^{-1} \tag{21}$$

    d. Iterate functions (22) and (23) until the convergence condition is fulfilled:

       $$\alpha_i^{new} = \frac{1 - \alpha_i\Sigma_{ii}}{\mu_i^2} \tag{22}$$

       $$(\sigma^2)^{new} = \frac{\|\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu}\|^2}{N - \sum_i (1 - \alpha_i\Sigma_{ii})} \tag{23}$$

       where $\mu_i$ is the ith posterior mean from function (20); $\Sigma_{ii}$ is the ith diagonal element of the posterior covariance from function (21), computed with the $\boldsymbol{\alpha}$ and $\sigma^2$ of the current iteration; N is the number of sample data.

    e. Delete the weights with $w_i = 0$ during the iteration. The vectors corresponding to the remaining $w_i$ are termed 'relevance vectors'. Consequently, RVM largely reduces model complexity and computing costs. The model parameters $\boldsymbol{\alpha}_{MP}$ and $\sigma^2_{MP}$ are obtained when model training is completed.

    f. Import the test data and the model parameters $\boldsymbol{\alpha}_{MP}$ and $\sigma^2_{MP}$ into the forecasting model. After calculation and de-normalization, the prediction value and its possible fluctuation range are obtained.


Table 3
Wind farm description.

                     WF1                     WF2
Installed capacity   183 MW                  201 MW
Running period       2010 (except October)   2011.5-2012.6
WT number            122                     134
Location             Northeast China         East-central China


The ORVM model provides not only a point prediction but also the variance of the prediction, calculated by function (11), which represents the possible prediction error. The variance comprises two kinds of error: one arises from the estimation error and the other from the uncertainty in the weights. This probabilistic forecasting mechanism greatly improves the practical value of the forecasts for risk management [14].
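The training loop of phase (4) and the resulting forecast interval can be sketched as follows. This is an illustrative NumPy implementation under assumptions: the pruning threshold `alpha_max`, the fixed iteration count, the small numerical guards and the synthetic data are choices of this sketch, and the 90% interval uses the standard two-sided Gaussian quantile 1.645. It is not the authors' code.

```python
import numpy as np

def fit_rvm(phi, t, n_iter=100, alpha_max=1e9):
    """Re-estimate alpha and sigma2 with functions (20)-(23), pruning large alpha."""
    n, m = phi.shape
    alpha = np.ones(m)
    sigma2 = 0.1 * float(np.var(t)) + 1e-10           # assumed starting noise variance
    keep = np.arange(m)                                # indices of basis functions still in use
    for _ in range(n_iter):
        p = phi[:, keep]
        cov = np.linalg.inv(p.T @ p / sigma2 + np.diag(alpha[keep]))   # function (21)
        mu = cov @ p.T @ t / sigma2                                    # function (20)
        gamma = np.clip(1.0 - alpha[keep] * np.diag(cov), 1e-12, 1.0)
        alpha[keep] = gamma / (mu ** 2 + 1e-30)                        # function (22)
        resid = t - p @ mu
        sigma2 = max(float(resid @ resid) / max(n - gamma.sum(), 1e-6), 1e-10)  # function (23)
        pruned = keep[alpha[keep] < alpha_max]         # drop weights driven to zero
        if pruned.size == 0:
            break
        keep = pruned
    p = phi[:, keep]
    cov = np.linalg.inv(p.T @ p / sigma2 + np.diag(alpha[keep]))
    mu = cov @ p.T @ t / sigma2
    return keep, mu, cov, sigma2

def design(x, centers, width):
    """Bias column followed by Gaussian kernel columns."""
    sq = (x[:, None] - centers[None, :]) ** 2
    return np.hstack([np.ones((len(x), 1)), np.exp(-sq / (2.0 * width ** 2))])

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 40)
t_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(40)
phi = design(x_train, x_train, width=0.2)

keep, mu, cov, sigma2 = fit_rvm(phi, t_train)

x_new = np.array([0.1, 0.5, 0.9])
phi_new = design(x_new, x_train, width=0.2)[:, keep]
y_new = phi_new @ mu                                               # function (10)
var_new = sigma2 + np.einsum("ij,jk,ik->i", phi_new, cov, phi_new) # function (11)
half_width = 1.645 * np.sqrt(var_new)                              # 90% two-sided Gaussian interval
lower, upper = y_new - half_width, y_new + half_width
```

The columns that remain in `keep` correspond to the relevance vectors (plus the bias term if it survives), which is where the sparsity of the model comes from.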


4. Case study 
4.1. Data 


The data of the two wind farms in China include the mean wind farm output collected from SCADA, the mean wind speed from the met mast and the numerical weather prediction data. All data have a 15-min resolution and cover the running periods shown in Table 3. In WF1, the wind speed measured by the SCADA system of each wind turbine is available, whereas it is not available in WF2. This means that the grouping engine could only be tested in WF1. Of the available data, 80% are used as candidate training samples and 20% as test samples.

To evaluate the performance of the proposed model, the ORVM models are compared with the support vector machine (SVM) and the ANN optimized by a genetic algorithm (GA-ANN) in terms of forecast accuracy, model complexity and model running time. For the sake of fairness, all methods use the same input variables, training samples and test samples. Note that, because ANN demands a large number of training samples, its training set combines the 12 monthly ORVM training sets as a whole, so only one ANN model is established.


4.2. Evaluation 


Two frequently used error criteria are adopted for the numerical experiments in this paper. The root mean square error (RMSE) in function (24) is computed over the whole validation period and gives a better evaluation of the prediction error over a longer period [12]. The mean absolute error (MAE) in function (25) is another commonly used error measure.

$$\mathrm{RMSE} = \frac{\sqrt{\sum_{i=1}^{n} (P_{Mi} - P_{Pi})^2}}{Cap \times \sqrt{n}} \tag{24}$$

$$\mathrm{MAE} = \frac{\sum_{i=1}^{n} |P_{Mi} - P_{Pi}|}{Cap \times n} \tag{25}$$

where $P_{Mi}$ and $P_{Pi}$ indicate the actual and forecast values of wind power output at time i; Cap denotes the installed capacity of the wind farm; n is the number of samples involved.
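The two criteria can be written as short functions. The sketch below uses only the Python standard library; the sample values are made up for illustration, with the capacity set to the 183 MW of WF1.

```python
import math

def normalized_rmse(actual, forecast, capacity):
    """RMSE of function (24): root of the summed squared errors over Cap * sqrt(n)."""
    n = len(actual)
    squared = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    return math.sqrt(squared) / (capacity * math.sqrt(n))

def normalized_mae(actual, forecast, capacity):
    """MAE of function (25): summed absolute errors over Cap * n."""
    n = len(actual)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / (capacity * n)

# Made-up power values in MW, for illustration only.
actual = [100.0, 120.0, 90.0, 60.0]
forecast = [110.0, 115.0, 95.0, 50.0]
capacity = 183.0

rmse_value = normalized_rmse(actual, forecast, capacity)
mae_value = normalized_mae(actual, forecast, capacity)
```

Because both measures are divided by the installed capacity, they are dimensionless and can be compared between wind farms of different size.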


Table 4
RMSE comparison of single NWP forecasts and grouping forecasts in WF1.

                     ORVM (%)   SVM (%)   GA-ANN (%)
Single NWP           9.9        12.5      13.3
Grouping forecasts   9.1        11.4      12.1


Table 5
Comparisons of forecasts accuracy for each month in WF1 (forecasts with grouping engine).

Month       ORVM model        SVM model         GA-ANN model
            RMSE     MAE      RMSE     MAE      RMSE     MAE
Average     0.091    0.059    0.114    0.082    0.121    0.091
January     0.109    0.077    0.112    0.076    0.127    0.094
February    0.103    0.061    0.126    0.093    0.124    0.099
March       0.077    0.049    0.121    0.096    0.091    0.061
April       0.137    0.100    0.139    0.117    0.133    0.090
May         0.103    0.071    0.175    0.145    0.149    0.115
June        0.063    0.035    0.056    0.035    0.090    0.069
July        0.071    0.044    0.071    0.041    0.112    0.084
August      0.050    0.041    0.064    0.032    0.089    0.06
September   0.086    0.053    0.098    0.056    0.143    0.103
November    0.101    0.054    0.170    0.121    0.151    0.138
December    0.098    0.072    0.123    0.09     0.120    0.092


Table 6
Comparisons of forecasts accuracy for each month in WF2 (forecasts without grouping engine).

Month       ORVM model        SVM model         GA-ANN model
            RMSE     MAE      RMSE     MAE      RMSE     MAE
Average     0.119    0.092    0.144    0.104    0.142    0.106
January     0.140    0.090    0.147    0.092    0.159    0.101
February    0.137    0.091    0.141    0.097    0.150    0.119
March       0.142    0.120    0.149    0.125    0.147    0.125
April       0.151    0.134    0.177    0.1403   0.162    0.127
May         0.169    0.146    0.174    0.157    0.185    0.151
June        0.142    0.111    0.155    0.107    0.169    0.140
July        0.083    0.05     0.141    0.084    0.128    0.079
August      0.098    0.073    0.169    0.108    0.148    0.097
September   0.101    0.081    0.128    0.091    0.114    0.087
October     0.053    0.038    0.076    0.050    0.075    0.051
November    0.101    0.070    0.152    0.111    0.148    0.107
December    0.113    0.103    0.125    0.095    0.127    0.097

4.3. Analysis and discussion 


Table 4 compares single NWP forecasts and grouping forecasts for the three models to verify the effectiveness of the grouping engine. Single NWP denotes forecasting with only one set of NWP data at the met mast, while grouping forecasts denotes forecasting with several sets of NWP data at the reference sites of the groups in Table 1. The results show that the grouping engine has a positive effect on all three forecasting methods and improves the yearly average RMSE by (9.9% - 9.1%)/9.9% = 8.08% for ORVM, (12.5% - 11.4%)/12.5% = 8.8% for SVM and (13.3% - 12.1%)/13.3% = 9.02% for GA-ANN.
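The relative improvements quoted for Table 4 can be reproduced with a few lines of Python. The helper name `relative_improvement` is hypothetical; the input values are the yearly average RMSE figures (in %) from Table 4.

```python
def relative_improvement(baseline, improved):
    """Relative reduction of an error measure, in percent of the baseline."""
    return (baseline - improved) / baseline * 100.0

# Yearly average RMSE in % from Table 4: (single NWP, grouping forecasts).
table4_rmse = {
    "ORVM": (9.9, 9.1),
    "SVM": (12.5, 11.4),
    "GA-ANN": (13.3, 12.1),
}
improvements = {
    model: round(relative_improvement(single, grouped), 2)
    for model, (single, grouped) in table4_rmse.items()
}
```

Expressing the gain relative to the single-NWP baseline makes the three models comparable even though their absolute error levels differ.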

Tables 5 and 6 show the full-year forecast accuracy of ORVM, SVM and GA-ANN in WF1 and WF2. Because of the data management in WF2, the wind speed data measured by the wind turbines are unavailable. Consequently, the grouping engine could be tested in WF1 but not in WF2. That is one of the reasons why ORVM performs better in WF1 than in WF2. Moreover, the slightly worse NWP quality in WF2, shown in Figs. 3 and 4, might be another reason for its larger forecasting RMSE. In general, the RMSE and MAE of ORVM, as defined in functions (24) and (25), are considerably lower than those of SVM and GA-ANN. Based on the averages in Tables 5 and 6, the average RMSE and average MAE of ORVM are lower than those of SVM by about (11.4% - 9.1%)/11.4% = 20.18% and (8.2% - 5.9%)/8.2% = 28.05% in WF1, and by (14.4% - 11.9%)/14.4% = 17.36% and (10.4% - 9.2%)/10.4% = 11.54% in WF2, which demonstrates the capability of the ORVM model in wind power prediction.

To better illustrate the forecast trend and the probabilistic forecast capability of the proposed model, the forecasting results of four days from different seasons are presented as examples for each wind farm. Figs. 7 and 8 show the predicted and real values of wind farm output as well as the range of possible fluctuations at the 90% confidence level. ORVM provides not only a point prediction but also its uncertainty analysis, that is, both the upper and lower limits of the power fluctuation. This can provide more scientific guidance on risk decisions for both wind farm operators and electric system dispatchers.


Table 7
Comparisons of computing time and vector number for each model.

       Model    Training time (s)   Test time (s)   Number of vectors involved
WF1    ORVM     192.136             52.190          92.52
       SVM      180.119             67.529          116.39
       GA-ANN   1286.342            109.327         -
WF2    ORVM     15.354              0.962           69.25
       SVM      12.008              1.003           93.06
       GA-ANN   513.452             1.795           -

Validation test of the uncertainty analysis: throughout the year, the percentage of real power production values that fall within the fluctuation range at the 90% confidence level is 89.928% on average in WF1 and 88.31% in WF2. This indicates that the ORVM model can quantitatively assess the uncertainty of wind power prediction.
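The validation described above amounts to counting how often the measured power falls inside the forecast interval. A minimal sketch, with a hypothetical function name and made-up numbers, is:

```python
def interval_coverage(actual, lower, upper):
    """Fraction of actual values lying within [lower, upper]."""
    inside = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    return inside / len(actual)

# Made-up power values in MW, for illustration only.
actual = [10.0, 20.0, 30.0, 40.0, 50.0]
lower = [8.0, 21.0, 25.0, 35.0, 45.0]
upper = [12.0, 25.0, 35.0, 45.0, 49.0]

coverage = interval_coverage(actual, lower, upper)
```

For a well-calibrated 90% interval, the coverage computed over a long test period should be close to 0.9, which is the comparison made for WF1 and WF2.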

To evaluate the efficiency of the proposed model, its computing time (training and test time) and number of vectors are presented in Table 7 together with those of SVM and GA-ANN. ORVM involves fewer vectors than SVM and needs the shortest test time of the three models, while its training time is close to that of SVM and far below that of GA-ANN; this reflects its efficient learning capacity and simple model structure. The running time, measured on modest hardware (a 2.79 GHz processor with 3.12 GB of RAM), is fully acceptable for day-ahead dispatching decisions and even for ultra-short-term operation. Note that the training and test times for WF1 are much longer than those for WF2 because the WF1 test involves the grouping engine, so more sets of NWP data and more forecasting models take part in the calculation.


5. Conclusion 


1. Monthly ORVM models for grouped wind power prediction have 
been proposed. Results of the case study involving two wind 
farms in China show that the full-year average RMSE and MAE of 
the proposed model are 9.1% and 5.9%, respectively, both lower 
than those of SVM and GA-ANN. Furthermore, the proposed 
model effectively provides a quantitative assessment of forecasting 
uncertainty. The ORVM model outperforms SVM and GA-ANN in 
terms of wind power forecast accuracy and practicality. 

2. A grouping engine has been established to divide wind turbines 
into several groups, improving forecasting accuracy and, through 
the smoothing effect, minimizing NWP computational cost. 
The grouping engine identifies similarities in wind speed, 
WT power output and site among wind turbines at different 
locations. An NWP reference site is then selected for each 
group to represent the general condition of its wind resource. 
This makes the NWP data more representative of a given area 
and thereby improves forecasting performance. 

3. A method for selecting training samples is presented that 
considers the instability and seasonal character of NWP accuracy 
as well as the small training-sample requirement of RVM. This 
method makes the forecasting models better suited to the NWP 
characteristics of each month and so significantly improves 
forecast accuracy. In addition, PSO has been applied to search for 
appropriate model parameters for different samples. 

Fig. 7. Probabilistic forecast results of WF1 on 24th May (a), 29th July (b), 25th Sep. (c) and 26th Dec. (d). 

Fig. 8. Probabilistic forecast results of WF2 on 15th Feb. (a), 20th April (b), 15th July (c) and 17th Dec. (d). 

4. The merits of the RVM-based model for wind power prediction are 
as follows: 

• Diverse selection of kernel function improves model adaptability: 
the kernel function need not satisfy Mercer's condition. 
Thus, an RVM-based model can more precisely simulate the 
power output of different wind farms over a wider scope; 

• Probabilistic prediction: an RVM-based model provides the 
fluctuation range of the prediction at a given confidence level 
rather than only a single deterministic value of wind power; 

• Fewer samples required in the training process: on the one 
hand, this makes it feasible to build a separate prediction 
model for each month, which reflects the distribution of NWP 
accuracy more faithfully and thus enhances prediction 
capability. On the other hand, a prediction model can be built 
for newly built wind farms that have little historical data; 

• Sparsity: most weights automatically tend to zero during the 
training process, so only a few relevance vectors remain. 
Consequently, the model uses far fewer vectors than SVM in 
the computation. Moreover, the number of relevance vectors 
does not grow linearly with the size of the training set. 
Because the model retains only the vectors relevant to accurate 
prediction, model complexity is greatly reduced and training 
efficiency is improved; 

• Simplified parameter setting: unlike SVM, only the width of 
the kernel function needs to be set, which minimizes subjective 
influence on the RVM-based model. 
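
The PSO-based parameter search noted in conclusion 3 can be illustrated with a minimal one-dimensional particle swarm that tunes a single kernel width. This is a sketch under stated assumptions, not the authors' implementation: the swarm settings (`inertia`, `c1`, `c2`, particle and iteration counts) are common textbook defaults, and `toy_validation_error` is a hypothetical stand-in for the validation error of a model trained with a given width.

```python
import random


def pso_minimize(objective, lower, upper, n_particles=20, n_iter=60,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize a 1-D objective on [lower, upper] with a basic particle swarm."""
    rng = random.Random(seed)
    pos = [rng.uniform(lower, upper) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]                          # personal best positions
    best_val = [objective(p) for p in pos]     # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g], best_val[g]    # global best so far

    for _ in range(n_iter):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Velocity blends momentum, pull to personal best, pull to global best.
            vel[i] = (inertia * vel[i]
                      + c1 * r1 * (best_pos[i] - pos[i])
                      + c2 * r2 * (g_pos - pos[i]))
            # Move the particle and clamp it to the search bounds.
            pos[i] = min(max(pos[i] + vel[i], lower), upper)
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i], val
                if val < g_val:
                    g_pos, g_val = pos[i], val
    return g_pos, g_val


def toy_validation_error(width):
    """Hypothetical error curve with its minimum at width = 2.0 (assumption)."""
    return (width - 2.0) ** 2 + 0.1


best_width, best_err = pso_minimize(toy_validation_error, 0.1, 10.0)
print(f"Selected kernel width: {best_width:.3f}, error: {best_err:.4f}")
```

In practice the objective would retrain the forecasting model for each candidate width and return its error on held-out samples, which is far more expensive than the toy function used here.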